A polynomial invariant for a new class of phylogenetic networks

Invariants for complicated objects such as those arising in phylogenetics, whether they are invariants as matrices, polynomials, or other mathematical structures, are important tools for distinguishing and working with such objects. In this paper, we generalize a complete polynomial invariant on trees to a class of phylogenetic networks called separable networks, which will include orchard networks. Networks are becoming increasingly important for their ability to represent reticulation events, such as hybridization, in evolutionary history. We provide a function from the space of internally multi-labelled phylogenetic networks, a more generic graph structure than phylogenetic networks where the reticulations are also labelled, to a polynomial ring. We prove that the separability condition allows us to characterize, via the polynomial, the phylogenetic networks with the same number of leaves and same number of reticulations by considering their internally labelled versions. While the invariant for trees is a polynomial in Z[x1,…,xn,y] where n is the number of leaves, the invariant for internally multi-labelled phylogenetic networks is an element of Z[x1,…,xn,λ1,…,λr,y], where r is the number of reticulations in the network. When the networks are considered without leaf labels the number of variables reduces to r + 2.


Introduction
A complete polynomial invariant able to uniquely distinguish between rooted trees has been recently introduced in [1]. Motivated to analyze and compare tree shapes in a phylogenetic context, this polynomial (to which we will refer as the Liu polynomial) has been used both to define a similarity measure on rooted tree shapes and to estimate parameters and models via its coefficients [2]. Moreover, its generalization from trees to networks (by analyzing the set of embedded spanning trees in the network) has also been used to study the properties of randomly generated networks [3].
We note that the word "invariant" is used here in its traditional sense, and not in the sense used in algebraic-geometric approaches to phylogenetics, in which phylogenetic invariants for an evolutionary model along a tree are the polynomials that vanish on the expected frequencies of base patterns at the leaves [4]. Throughout this article, a (complete) invariant of a set A is a […]
A multitude of (non-polynomial) invariants have been defined for specific subclasses of phylogenetic networks. To name just a few: the μ-vectors, which store the number of paths from nodes to leaves, characterize (among others) tree-child networks [5] and orchard networks (without stacks) [6]; the set of displayed trees characterizes regular networks [7]; and the induced trinets (minimal subnetworks induced by triples of leaves) characterize (among others) level-2 networks [8] and orchard networks [9].
In this paper we show how a polynomial invariant can be defined for rooted phylogenetic networks, generalizing the Liu polynomial invariant for trees. In order to do so, we consider phylogenetic networks and a labelled version of them, called internally labelled phylogenetic networks, where we keep the labels on leaves and also (bijectively) label the reticulations. In fact, internally labelled phylogenetic networks are a subset of a more general set of networks, which we call internally multi-labelled phylogenetic networks, or IMLN's. On these networks the presence of elementary nodes is allowed, and leaves, reticulations and elementary nodes are all labelled. Then, if we denote by PN the set of all phylogenetic networks (up to isomorphism) and by ILPN the set of all internally labelled phylogenetic networks (up to isomorphism), the map F: ILPN → PN that sends each internally labelled phylogenetic network to the phylogenetic network obtained by "forgetting" all the internal labels (on reticulations) is well defined. Therefore, for each N ∈ PN, F^{-1}(N) is the set of all the internally labelled phylogenetic networks that have the same topology as N; that is, its fiber, in mathematical terms.
The aim of this paper is to define a polynomial p that uniquely characterizes these fibers and, in so doing, also characterizes the phylogenetic networks beneath them. See the diagram below. Since F is not injective, the dashed arrows denote maps that are not unique. We will see that, in general, p is not injective, but that it is injective under a suitable topological condition.
This paper is organized as follows. In the Methods section we introduce the three main graph structures of study: phylogenetic networks, internally labelled phylogenetic networks and internally multi-labelled phylogenetic networks (or IMLN's). We also define the concept of isomorphism on these structures. The Results section is divided into two main subsections. The first one studies a process that unfolds an IMLN into a tree (an IMLT) and its reverse, folding, that recovers the initial IMLN. The key result of this subsection is the characterization of an IMLN by an IMLT (Corollary 10). The second subsection is dedicated to the definition and study of an extension of the Liu polynomial to IMLN's. If N is an IMLN on a set of leaves labelled by X, the assigned polynomial p(N) has |X| + r + 1 variables, where r is the number of reticulations in the network. This subsection is further divided into five parts. The first part studies a special type of path (composed only of reticulations or elementary nodes) in IMLN's, called strong paths. Roughly speaking, these allow us to define an equivalence relation between IMLN's, and we prove that two IMLN's share the polynomial if, and only if, they are equivalent (Theorem 15). The second part gives a sufficient condition on the space of phylogenetic networks (which we call separability) for the derived internally labelled phylogenetic networks to be completely characterized by the polynomial. The lemmas proved in this part allow us to prove the main result (Theorem 22) in the third part; that is, the polynomial is a complete invariant in the set of internally labelled separable phylogenetic networks up to isomorphism. The fourth part proves that orchard networks are separable, and so are characterized by the polynomial introduced in this paper (Theorem 23). Finally, in the last part, we show how the obtained results can be applied to an unlabelled version of networks, in the sense that we forget the labelling of the leaves, reducing the polynomial to r + 2 variables (Proposition 24). The paper closes with a Discussion and Conclusion section.

Methods
In this section we introduce the mathematical notation that will be used in the rest of the paper.
Throughout this paper, X will denote a non-empty finite set (of taxa). Commonly, we will use X = {x_1, …, x_n}, and we will allow ourselves to see each member of X as an irreducible polynomial in Z[x_1, …, x_n]; i.e., we will consider the labels of the leaves in our networks to be polynomials of the form x_i for i ∈ {1, …, n}.
Definition 1. A rooted binary phylogenetic network N = (V, E) on X, or simply a phylogenetic network on X, is a rooted directed acyclic graph with no parallel arcs satisfying the following conditions: […]
[…] if the labelling ℓ on reticulations is suppressed. Also, if we consider a phylogenetic network and we add a labelling bijection ℓ: R(N) → {λ_1, …, λ_r}, it becomes an IMLN. In order to reflect this possibility, we introduce the following definition.
Definition 4. An internally labelled phylogenetic network N on X is an IMLN on X without elementary nodes and where the maps φ: L(N) → X and ℓ: R(N) → {λ_1, …, λ_r} are bijections.
In order to formally define the concept of isomorphism between a pair of phylogenetic networks or between a pair of IMLN's, we consider the alternative notations (V, E, φ) and (V, E, φ, ℓ), respectively, which make the labelling functions explicit.
Definition 5. Two phylogenetic networks N_1 = (V_1, E_1, φ_1) and […] That is, a graph isomorphism that preserves the labels of both the reticulation and elementary nodes.

Folding and unfolding
Following [10], a phylogenetic network can be "unfolded" in a specific manner to obtain a multi-labelled tree, that is, a particular IMLT without elementary nodes in terms of the previous definitions. Moreover, in some cases, this process can be reversed, and the multi-labelled tree can be "folded" to recover the initial network. A phylogenetic network cannot in general be characterized by a multi-labelled tree, and this correspondence is valid only for the subclass of FU-stable phylogenetic networks [10].
In this subsection, however, we prove that an internally labelled phylogenetic network can be uniquely characterized by an IMLT obtained by a sequence of "unfoldings" on its reticulation nodes. Roughly speaking, considering the reticulations of an IMLN in a specific order, it is possible to sequentially duplicate the subnetwork descending from these nodes until an IMLT is obtained.
Let N be a (generic) IMLN, and R(N) the set of its reticulation nodes. The relation of being a descendant of another node induces a partial order over R(N), which we will denote by ≤_R. That is, for any two nodes u, v ∈ R(N), u ≤_R v if, and only if, there exists a directed path from v to u. Let R_min(N) be the set of the minimal elements of R(N) under this order, i.e. reticulation nodes such that none of their descendants are also reticulation nodes.
Lemma 1. Let N be an IMLN and u ∈ R_min(N). Then the graph rooted at u is an IMLT.
Proof. If u ∈ R_min(N), then there is no path in N from u to another reticulation. This means that there are no reticulations in the graph rooted at u; and therefore it is an IMLT.
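The order ≤_R and the set R_min(N) can be computed directly from the arc set. The following is a minimal sketch, assuming (hypothetically) that a network is stored as a dictionary mapping each node to the list of its children, and that reticulations are exactly the nodes of in-degree 2; the names `reticulations`, `descendants`, `minimal_reticulations` and the toy network `net` are illustrative and do not come from the paper.

```python
from collections import defaultdict

def reticulations(children):
    """Nodes of in-degree 2 in a DAG given as {node: [children]}."""
    indeg = defaultdict(int)
    for kids in children.values():
        for c in kids:
            indeg[c] += 1
    return {v for v, d in indeg.items() if d == 2}

def descendants(children, u):
    """All nodes reachable from u by a directed path of length >= 1."""
    seen, stack = set(), list(children.get(u, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    return seen

def minimal_reticulations(children):
    """R_min(N): reticulations with no reticulation among their descendants."""
    ret = reticulations(children)
    return {u for u in ret if not (descendants(children, u) & ret)}

# Toy network (an assumption for illustration): h2 lies below h1.
net = {
    "r": ["a", "b"],
    "a": ["x1", "h1"],
    "b": ["h1", "c"],
    "h1": ["d"],
    "d": ["h2", "x2"],
    "c": ["h2", "x3"],
    "h2": ["x4"],
}
r_min = minimal_reticulations(net)
```

In this toy network h2 ≤_R h1, so only h2 is minimal, and the graph rooted at h2 is a tree, as Lemma 1 states.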
Let N be an IMLN, and consider u ∈ R_min(N) (so that u is labelled by an element in {λ_1, …, λ_r}). Let v_1, v_2 be its parents, noting that v_1 ≠ v_2 due to the fact that parallel arcs are excluded. Define U(N, u) to be the unfolded IMLN of N at u, obtained by the following algorithm: […]
Remark 1. Notice that the process of unfolding preserves paths in the following sense: if N′ is obtained from N by unfolding N at some node u, then any path between two nodes in N′ comes from an existing path in N; and vice versa, any path between two nodes in N corresponds to a path in N′. Notice, however, that a path in N might very well correspond to two different paths in N′, and so this assignment is not injective.
Corollary 2. Let N be an IMLN, and u ∈ R_min(N). Then U(N, u) is an IMLN.
Let N be an IMLN. We say that a sequence (u_1, …, u_k) of nodes in R(N) is compatible if the associated sequence (N, N_{u_1}, N_{u_2}, …, N_{u_k}) of IMLN's is such that u_{i+1} ∈ R_min(N_{u_i}) and u_1 […] there is no path from u_i to u_j when j > i; i.e., it is non-decreasing under the partial order ≤_R induced by the network over R(N).
Lemma 3. Let N be an IMLN and u_1, u_2 ∈ R_min(N). Then, […]
Proof. It is straightforward by Lemma 1 and the steps of the unfolding algorithm. If u_1 ∈ R_min(N), then u_2 ∈ R_min(U(N, u_1)); otherwise there would be a reticulation node u′ in R(U(N, u_1)) and a path from u_2 to u′ in U(N, u_1), and so in N, which is a contradiction. Then, by Lemma 1, the graph rooted at u_2 in U(N, u_1) is an IMLT. Since u_2 is not a node in any of the copies of the IMLT rooted at u_1 in the construction of U(N, u_1), there is no intersection between the copies from u_1 and the copies from u_2. Since the same argument holds if we start with u_2, the result follows.
Lemma 3 can be extended, following the same arguments, to any set of reticulations {u_1, …, u_k} if all of them are in R_min(N), since there will be no intersection between the created copies of IMLT's.
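As an illustration of a single unfolding step, the sketch below duplicates the tree rooted at a minimal reticulation u and hands the copy to the second parent, so that u and its copy both become elementary nodes carrying the same label. The steps of the unfolding algorithm are not reproduced above, so this is a reconstruction from the way unfolding is used in the proofs of Lemma 8 and Proposition 12, not the authors' exact procedure; the dictionary representation and all names (`unfold`, `net`, `labels`, the `'` suffix for copies) are assumptions made for the example.

```python
def all_nodes(children):
    """Every node mentioned as a key or as a child."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    return nodes

def unfold(children, labels, u):
    """Return (children', labels') after unfolding at the reticulation u.

    Assumes u is in R_min(N), so the subgraph rooted at u is a tree.
    """
    pars = [v for v, kids in children.items() if u in kids]
    if len(pars) != 2:
        raise ValueError("u must be a reticulation with two distinct parents")
    _, v2 = pars
    new_children = {v: list(kids) for v, kids in children.items()}
    new_labels = dict(labels)
    used = all_nodes(children)

    def clone(node):
        # Pick a fresh name for the copy and copy its label, if any.
        name = node + "'"
        while name in used:
            name += "'"
        used.add(name)
        if node in labels:
            new_labels[name] = labels[node]
        new_children[name] = [clone(c) for c in children.get(node, [])]
        return name

    copy_root = clone(u)
    # The second parent now points at the copy instead of u.
    new_children[v2] = [copy_root if c == u else c for c in new_children[v2]]
    return new_children, new_labels

# Toy network with a single reticulation h labelled lam1 (illustrative).
net = {"r": ["a", "b"], "a": ["x1", "h"], "b": ["h", "x3"], "h": ["x2"]}
labels = {"x1": "x1", "x2": "x2", "x3": "x3", "h": "lam1"}
new_children, new_labels = unfold(net, labels, "h")
```

After the call, no node has in-degree 2, and the two nodes labelled lam1 root isomorphic copies of the tree that hung below h.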
Let N be an IMLN. We define an equivalence relation ∼ in the set of compatible sequences of elements of R(N) as follows:
(u_1, u_2, …, u_k) ∼ (v_1, v_2, …, v_{k′}) ⟺ {u_1, u_2, …, u_k} = {v_1, v_2, …, v_{k′}}.
That is, we say that two compatible sequences are equivalent if they are composed of the same set of nodes.
A ≤_R-chain in an IMLN N is a chain under the ≤_R order defined on R(N) (or a subset of it); that is, a subset of reticulations such that u_1 ≤_R ⋯ ≤_R u_s. A ≤_R-antichain in an IMLN N is an antichain under the ≤_R order; i.e., a subset of reticulations of N which are pairwise incomparable (u_i ≰_R u_j and u_j ≰_R u_i if u_i ≠ u_j) under the ≤_R order.
In the next lemma we prove that if we consider a ≤_R-chain in an IMLN N then there is a single way to traverse these nodes in a compatible sequence, from bottom to top. On the other hand, if we consider a ≤_R-antichain, then every way to traverse these nodes is valid to form a compatible sequence.
[…]
(b). If S is a ≤_R-antichain, then every possible ordering of its nodes produces a compatible sequence composed of S.
Proof. We first prove (a). […]
Therefore if there exists a path from v_i to v_j, it produces a cycle in N; but this is not possible because N is an IMLN, and so in particular it is acyclic. This means that there is no path from v_i to v_j when i < j. Consequently, if i < j, v_i must precede v_j in every compatible sequence containing S.
Now we prove (b). Let v and v′ be two nodes in S. If v precedes v′ in a sequence there cannot be a path from v to v′; otherwise v′ ≤_R v. If v′ precedes v in a sequence there cannot be a path from v′ to v; otherwise v ≤_R v′. Since S is a ≤_R-antichain, both cases yield compatible sequences.
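The chain and antichain cases can be checked on small examples. The sketch below assumes one reading of compatibility: a reticulation may be appended to the sequence only once every reticulation below it already appears (which is what u_{i+1} ∈ R_min(N_{u_i}) amounts to when no reticulation is ever duplicated). The dictionary representation and the names `is_compatible`, `compatible_orderings`, `chain_net` and `antichain_net` are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def reticulations(children):
    """Nodes of in-degree 2 in a DAG given as {node: [children]}."""
    indeg = {}
    for kids in children.values():
        for c in kids:
            indeg[c] = indeg.get(c, 0) + 1
    return {v for v, d in indeg.items() if d == 2}

def descendants(children, u):
    """All nodes reachable from u by a directed path of length >= 1."""
    seen, stack = set(), list(children.get(u, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    return seen

def is_compatible(children, seq):
    """True if each node of seq is a reticulation whose reticulation
    descendants all occur earlier in seq (assumed reading of the text)."""
    ret = reticulations(children)
    done = set()
    for u in seq:
        if u not in ret or u in done:
            return False
        if (descendants(children, u) & ret) - done:
            return False
        done.add(u)
    return True

def compatible_orderings(children):
    """All compatible sequences that use every reticulation once."""
    ret = sorted(reticulations(children))
    return [p for p in permutations(ret) if is_compatible(children, p)]

# Chain: h2 is below h1.  Antichain: h1 and h2 are incomparable.
chain_net = {"r": ["a", "b"], "a": ["x1", "h1"], "b": ["h1", "c"],
             "h1": ["d"], "d": ["h2", "x2"], "c": ["h2", "x3"], "h2": ["x4"]}
antichain_net = {"r": ["a", "b"], "a": ["x1", "c"], "c": ["h1", "h2"],
                 "b": ["d", "x4"], "d": ["h1", "h2"],
                 "h1": ["x2"], "h2": ["x3"]}
```

On `chain_net` only the bottom-to-top ordering survives, while on `antichain_net` both orderings are compatible, matching the two parts of the lemma.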
Corollary 5. Let N be an IMLN and (u_1, u_2, …, u_k) ∼ (v_1, v_2, …, v_k) a pair of equivalent compatible sequences of elements of R(N). Let (N, N_{u_1}, N_{u_2}, …, N_{u_k}) and (N, N′_{v_1}, N′_{v_2}, …, N′_{v_k}) be the sequences of IMLN's associated to these compatible sequences. Then N_{u_k} and N′_{v_k} are isomorphic.
Proof. For k = 1 there is nothing to prove, since u_1 = v_1. For k = 2, if u_1, u_2 ∈ R_min(N), there is nothing to prove, because (u_1, u_2) and (u_2, u_1) are compatible sequences and Lemma 3 applies. If (u_1, u_2) is a compatible sequence and u_1 ≤_R u_2, then we must have (v_1, v_2) = (u_1, u_2) (and not (v_1, v_2) = (u_2, u_1)), since u_1 ∉ R_min(N_{u_2}).
The general situation for k ≥ 3 demands a different approach. Let s_1 = (u_1, u_2, …, u_k) and s_2 = (v_1, v_2, …, v_k), and let A = {u_1, u_2, …, u_k}. Then we can iteratively apply the following process to prove the result. Let A′ = {u ∈ A: u ∈ R_min(N)}. Note that A′ is not empty because u_1 and v_1 (which could be equal) are in R_min(N). Then, let s_1^{A′} be the sequence obtained from s_1 by moving all the nodes in A′ to the first positions (in such a way that if u_i, u_j ∈ A′ with i < j, then the node u_i appears before u_j in s_1^{A′}) and leaving the rest of the nodes in their original relative order. Note that s_1^{A′} is compatible by construction and s_1^{A′} ∼ s_1. A similar process can be repeated to obtain s_2^{A′} ∼ s_2. Note that the sets of nodes of A′ occupying the first |A′| positions in s_1^{A′} and s_2^{A′} are exactly the same, and form a ≤_R-antichain; but these nodes may not appear in the same order in both sequences.
Let u* be the last node (the rightmost) in […] compatible equivalent sequence to s_2^{A′} obtained by keeping all positions fixed except for the node u*, which becomes the last of the nodes of A′ in ŝ_2^{A′}. This ensures that the last node of the first |A′| positions in both s_1^{A′} and ŝ_2^{A′} is the same, u*. Note that it could be that u* = u_k = v_k (when A = A′). By Lemma 4(b) and Lemma 3, the IMLN N_{u*} obtained by sequentially unfolding at the nodes in s_1^{A′} until u* is reached is isomorphic to the IMLN obtained by sequentially unfolding at the (same) nodes in ŝ_2^{A′} until u* is reached. Then, the same process can be repeated by considering the new equivalent compatible sequences obtained from s_1^{A′} and ŝ_2^{A′} by suppressing the first |A′| positions and starting with the IMLN N_{u*}.
Therefore, given a compatible sequence (u_1, u_2, …, u_r) of all the elements of R(N), and its associated sequence (N, N_{u_1}, N_{u_2}, …, N_{u_r}), we define the unfolding of an IMLN N, denoted by U(N), by means of the equation U(N) = N_{u_r}. We may refer to such a sequence as a sequence of unfoldings. See Fig 1 for an example of a sequence of unfoldings for an IMLN; in fact, for an internally labelled phylogenetic network.
Now, we are interested in the "reverse" process to unfolding. Roughly speaking, we are interested in formally defining a way to "fold" an IMLT to recover the IMLN from which it comes. Given an IMLN N, we can also define a partial order over the set of elementary nodes E(N) by saying that for any two […] and a directed path from v to u. We call the set of elementary nodes that are maximal under this order E_max(N).
Lemma 6. Let (N, N_{u_1}, N_{u_2}, …, N_{u_r}) be a sequence of unfoldings of an internally labelled phylogenetic network N. For any N_{u_i} in it and for every u ∈ E_max(N_{u_i}), there exists exactly one other v ∈ E_max(N_{u_i}) such that ℓ(u) = ℓ(v) and the IMLT's N_{u_i}(u) and N_{u_i}(v) are isomorphic.
[…] such that there is a path from w to u. By Remark 1 this path is preserved in every N_{u_j} with j < i. Since the labelling function ℓ is injective over reticulation nodes and N has no elementary nodes, this means that the pair w, w′ corresponds to a reticulation node in some N_{u_j} with j < i; equivalently, this is a reticulation node equal to some u_j with j < i. This leads to a contradiction with the fact that the sequence (u_1, u_2, …, u_r) is compatible. If we consider a maximal element in N_{u_i} different from the two coming from the duplication of u_i in N′, the previous argument can be reproduced similarly. These pairs of maximal elements are preserved as maximal in every N_{u_j} with j < i right up until the unfolding at this reticulation is performed.
This proves that the IMLT's rooted at the corresponding copies of it are also preserved until N_{u_i} is reached.
In particular, in the proof of Lemma 6, and following the same notation, we show that the node u_i is maximal under the ≤_E order in N_{u_i}. Notice also that this could be false if elementary nodes are allowed in the initial IMLN N.
[…]
Proof. We begin with the "if" direction. If v, w are such that v ≤_R w when seen as reticulation nodes in N, there exists at least one path from w to v. Now, since w ∈ E_max(N_{u_i}), by Lemma 6, there exists w′ ∈ E_max(N_{u_i}) such that ℓ(w) = ℓ(w′) and N_{u_i}(w) = N_{u_i}(w′), via an isomorphism f.
[Fig 1. Example of a sequence of unfoldings of an IMLN N. Notice that N is an internally labelled phylogenetic network. The three figures below are the sequence of unfoldings (N_{u_2}, N_{u_3}, N_{u_1}) associated to the compatible sequence of reticulations (u_2, u_3, u_1). Following the introduced terminology, […] https://doi.org/10.1371/journal.pone.0268181.g001]

PLOS ONE
Then, since by hypothesis v ∈ E(N_{u_i}) and, by Remark 1, the path from w to v in N is preserved in N_{u_i}, there exists a path from w to v and a path from w′ to f(v), and ℓ(v) = ℓ(f(v)). Now, since there are no elementary nodes in N, there must exist j < i such that in N_{u_j} (it could be that N_{u_j} = N), the nodes v and w are reticulations. By Remark 1, this implies that there would exist a path from w to v in N_{u_j}, and therefore v ≤_R w in N_{u_j}, and so in N. This concludes the proof.
Given N an IMLN, u ∈ R_min(N) and U(N, u), we would like to consider N to be the result of a folding operation over U(N, u): N = F(U(N, u), u), for some suitable F. For any unfolding sequence (N, N_{u_1}, N_{u_2}, …, N_{u_r}), we say that each of its members is a (phylogenetic) pseudo-network; in particular, they are IMLN's. Equivalently, we can define a pseudo-network recursively as follows: let N be an IMLN; it is a pseudo-network if it satisfies the following three conditions:
(i). no reticulation node descends from an elementary node;
[…]
(iii). for any u ∈ E_max(N), the IMLN obtained by the process of […], as well as the edge (v^(1), v); 3. adding the arc (v^(1), u), is also a pseudo-network.
The IMLN obtained by the process described in (iii) is denoted by F(N, u), and called the […]
Lemma 8. Let N be a pseudo-network and u ∈ R_min(N). Then, F(U(N, u), u) = N.
[…] is duplicated in the unfolding process, u and a new copy of it, say v, are elementary nodes and the roots of N′(u) and N′(v) respectively, such that […] By definition of the folding process of N′ at u, the IMLT N′(v) and also the arc (v_2, v) are deleted and a new arc (v_2, u) is created. This results in a reticulation node u with parents v_1 and v_2 which is the root of N′(u). Since […]
Given N an IMLN and (N, N_{u_1}, N_{u_2}, …, N_{u_r}) a sequence of unfoldings, by Lemma 8 we have that N_{u_i} = F(N_{u_{i+1}}, u_{i+1}) and that N = F(N_{u_1}, u_1). Therefore, we derive the following result. […]
Note that, as with equivalent compatible sequences, there is not a unique way to recover the IMLN N by applying a set of foldings.
If N is a pseudo-network we know that it is the product of a sequence of unfoldings performed over an IMLN, N′. We can then rewrite Corollary 9 by defining a function F from the set of pseudo-networks to the set of IMLN's by F(N) ≔ N′. Hence,
Corollary 10. Let N be an internally labelled phylogenetic network. Then […]
This result is the analogue of the concept of stable networks in Section 4 of [10]. The key difference here is that we allow elementary nodes.

A polynomial for internally multi-labelled phylogenetic networks
Given a phylogenetic network N on X, one can obtain a rooted tree by removing one incident arc to each reticulation node. These (sub)trees could contain elementary nodes, and their leaves might be labelled in X (the leaves from N) and in other sets different from it (for instance when the single outgoing arc to a reticulation is removed). Those trees become unrooted if the direction of the arcs is suppressed (in particular, the root becomes a degree-two node), and are called embedded spanning trees if their set of leaves is exactly X. Tree-child phylogenetic networks are characterized by their set of embedded spanning trees [11], but general phylogenetic networks are not.
In [3], the Liu polynomial is generalized to phylogenetic networks by their sets of embedded spanning trees. Roughly speaking, the polynomial of the network is the product of the polynomials of the embedded spanning trees (considering trees with multiplicity). Consequently, this extension is a complete invariant for tree-child networks.
There are some natural extensions of the Liu polynomial to IMLN's that come to mind. The first one, for internally labelled phylogenetic networks, is to completely unfold such a network and, from any elementary node u labelled λ_i, for some i ∈ {1, …, r} and labels λ_i distinguishable from labels x_i, grow an arc to a new node v, label v as λ_i, and finally forget the labelling of u. Thus, the unfolded IMLT becomes a multi-labelled tree over leaves {x_1, …, x_n, λ_1, …, λ_r}. See an example of that decomposition in Fig 2, obtained from the internally labelled phylogenetic network N depicted in Fig 1. By means of Corollary 3.5 in [1], this extension of the polynomial is immediately seen to uniquely characterize an internally labelled phylogenetic network.
We will here deal with a natural extension that reflects the reticulation process in the sheer morphology of the polynomial, rather than in the name of the variables.
[…] Then, let ρ_N be the root of N; we define p(N) to be p(ρ_N). Notice that this definition of the polynomial p is given over generic IMLN's.
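The recursive rule defining p(u) is not reproduced above, but the proofs that follow use p(u) = φ(u) for a leaf, p(u) = ℓ(u)p(v) for an elementary or reticulation node with child v, and p(u) = y + p(v_1)p(v_2) for a tree node with children v_1, v_2. The sketch below computes the polynomial under exactly that assumed rule. Polynomials are stored as dictionaries from monomials to integer coefficients so that only the standard library is needed; the helper names (`var`, `add`, `mul`, `poly`), the variable name `lam1` for λ_1 and the toy network are illustrative assumptions.

```python
from collections import Counter

def var(name):
    """The polynomial consisting of the single variable `name`."""
    return {((name, 1),): 1}

def add(p, q):
    """Sum of two polynomials stored as {monomial: coefficient}."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
        if out[m] == 0:
            del out[m]
    return out

def mul(p, q):
    """Product of two polynomials; monomials are sorted (var, exp) tuples."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = Counter(dict(m1))
            exps.update(dict(m2))          # adds exponents variable-wise
            m = tuple(sorted(exps.items()))
            out[m] = out.get(m, 0) + c1 * c2
    return out

def poly(children, labels, u):
    """p(u) under the assumed recursive rule described in the lead-in."""
    kids = children.get(u, [])
    if not kids:                           # leaf: p(u) = phi(u)
        return var(labels[u])
    if len(kids) == 1:                     # elementary or reticulation node
        return mul(var(labels[u]), poly(children, labels, kids[0]))
    left, right = (poly(children, labels, k) for k in kids)
    return add(var("y"), mul(left, right))  # tree node

# Toy network: reticulation h (label lam1) with parents a and b.
net = {"r": ["a", "b"], "a": ["x1", "h"], "b": ["h", "x3"], "h": ["x2"]}
labels = {"x1": "x1", "x2": "x2", "x3": "x3", "h": "lam1"}
P = poly(net, labels, "r")
```

For this network the result is y + (y + λ_1 x_1 x_2)(y + λ_1 x_2 x_3). Because the recursion visits the reticulation once per parent, it effectively evaluates the polynomial on the unfolded tree, which is consistent with the statement of Proposition 12.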
For example, the polynomial associated to the IMLN represented in Fig 1 is […]
Proof. If u is not a tree node the polynomial will not be irreducible, since then there would exist v ∈ V(N) as the only child of u, and p(u) = ℓ(u)p(v).
It then remains only to see that if u is a tree node, p(u) is irreducible. In this case, either u is a leaf and then p(u) = φ(u) = x_i for some i ∈ {1, …, n}, and so it is irreducible, or u has two children and p(u) = y + Λp(w_1)p(w_2), where Λ is a product of λ_i from λ_1, …, λ_r, and w_1, w_2 are the first descendants of u at each side that are tree nodes (they are possibly equal). Now consider the polynomial p′(u) obtained from p(u) by changing every variable x_1, …, x_n, λ_1, …, λ_r for, say, x_1. Then, it can be seen that p′(u) satisfies Eisenstein's irreducibility criterion in Z[y][x_1] (which is a unique factorization domain, UFD) applied to the ideal ⟨y⟩, and so p(u) is irreducible when seen as a polynomial in Z[y][x_1, …, x_n, λ_1, …, λ_r]. But, since y does not divide p(u), p(u) is also irreducible in Z[x_1, …, x_n, λ_1, …, λ_r, y].
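To make the use of Eisenstein's criterion concrete, the following worked case checks its hypotheses by hand. It assumes the smallest non-trivial situation, a tree node u whose two children are the leaves labelled x_1 and x_2; this toy case is chosen for illustration and is not taken from the paper.

```latex
% Assumed toy case: u is a tree node with leaf children x_1 and x_2.
\begin{align*}
  p(u)  &= y + x_1 x_2, \\
  p'(u) &= y + x_1^{2} \in \mathbb{Z}[y][x_1]
          \quad \text{(every variable other than $y$ replaced by $x_1$)}.
\end{align*}
% Eisenstein's criterion with the prime ideal <y> of Z[y]:
\begin{itemize}
  \item the leading coefficient (of $x_1^{2}$) is $1 \notin \langle y \rangle$;
  \item the remaining coefficients, $0$ (of $x_1$) and $y$ (constant term),
        lie in $\langle y \rangle$;
  \item the constant term $y$ does not lie in
        $\langle y \rangle^{2} = \langle y^{2} \rangle$.
\end{itemize}
% Hence p'(u) is irreducible in Z[y][x_1].
```

The same three checks go through for larger tree nodes because, modulo y, the polynomial p′(u) reduces to a pure power of x_1, while its constant term is y plus a multiple of y².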
The next proposition will show that the polynomial is conserved throughout a sequence of unfoldings, and therefore will allow us to compute it over any of its members without distinction. In particular, it can be computed on the unfolding of the network.
Proposition 12. Let N be an IMLN, and (N, N_{u_1}, N_{u_2}, …, N_{u_r}) be a sequence of unfoldings. Then, p(N) = p(N_{u_1}) and, for any i ∈ {1, …, r − 1}, p(N_{u_{i+1}}) = p(N_{u_i}).
Proof. Let N′ be an IMLN, and u ∈ R_min(N′). If we are able to show that p(N′) = p(U(N′, u)), then the proposition will hold. Let v^(1), v^(2) be the parents of u; in U(N′, u) each of them will be the parent of at least one elementary node u_x, x ∈ {1, 2}, which will be the root of a copy of the IMLT N′(u), and by construction p(u_1) = p(u_2) = p(u) = p(N′(u)). Now, by the definition of the polynomial, p(v^(x)) will be the same in N′ and in U(N′, u). Therefore, p(N′) = p(U(N′, u)).
We now introduce two remarks, the first concerning the interpretation of the coefficients and, the second, about the reconstruction of the unfolding of an IMLN from the polynomial if it characterizes the IMLN.
Remark 2. The interpretation of the coefficients of the polynomial p(N) can be extended from Lemma 2.4 in [1] by slightly modifying the definition of primary subtrees to the IMLT T = U(N). Let a primary subtree S of T be a rooted subtree of T such that S shares the same root node with T and any leaf node in T is either a leaf node in S or a descendant of a leaf node in S which does not come from an elementary node.
Then, if we represent p(N) as
Σ c(γ_1, …, γ_r, α_1, …, α_n, β) λ_1^{γ_1} ⋯ λ_r^{γ_r} x_1^{α_1} ⋯ x_n^{α_n} y^β,
each one of its coefficients counts the number of primary subtrees of U(N) satisfying that:
• γ_i (for i ∈ {1, …, r}) is the number of nodes labelled by λ_i of these subtrees;
• α_i (for i ∈ {1, …, n}) is the number of leaf nodes labelled by x_i of these subtrees which are also leaves in U(N);
• β is the number of leaf nodes of these subtrees which are internal nodes in U(N).
[…] Fig 1. Notice that these primary subtrees can then be folded into a sort of "sub-primary networks".
Remark 3. In this remark we shall give a first approximation to the problem of reconstructing the Newick string of an IMLT U(N) from p(N), in the case where the polynomial characterizes N. Roughly speaking, we proceed as follows: start by subtracting y from p(N) and then factor p(N) − y = q_1 · q_2. Then the Newick string to consider is (q_1, q_2). From now on, whenever it is possible to subtract y from a polynomial q, do so. If the factorization involves only two members, q = q_1 · q_2, then proceed as before and replace q by (q_1, q_2). Otherwise, there could be conflicts in terms of deciding how to group members in a factorization of type ∏ […], where q_k are polynomials. But there will always be, in the queue of factorizations pending to be grouped, a pair of them where a "minimum" monomial of type λ_i · q_s is common to both; this allows one to determine that there is an arc from an elementary node labelled by λ_i to the subtree determined by the polynomial q_s. In terms of the Newick string, it could be replaced by (λ_i(q_s)).
We are now especially interested in determining under which conditions the polynomial associated to an IMLN uniquely characterizes it. Note that this is not always the case, not even for IMLT's. See for instance the three representations of IMLT's in Fig 4. The polynomial fails to distinguish between them. Roughly speaking, looking at the polynomials of the elementary vertices we could readily distinguish between the three possibilities, but we cannot do so by only looking at p(u), since p(u) = y + λ_1 λ_2 p(w_1)p(w_2).
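This failure can be reproduced on small trees. Fig 4 is not available here, so the sketch below assumes three arrangements of two elementary nodes labelled λ_1 and λ_2 between a tree node u and its strong descendants w_1, w_2 (both on the left branch, one on each branch, both on the right branch), and takes w_1, w_2 to be leaves. The tagged-tuple tree encoding, the helper names and the recursive rule for p are illustrative assumptions consistent with the formula above.

```python
from collections import Counter

def var(name):
    """The polynomial consisting of the single variable `name`."""
    return {((name, 1),): 1}

def add(p, q):
    """Sum of two polynomials stored as {monomial: coefficient}."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return out

def mul(p, q):
    """Product of two polynomials; monomials are sorted (var, exp) tuples."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = Counter(dict(m1))
            exps.update(dict(m2))
            m = tuple(sorted(exps.items()))
            out[m] = out.get(m, 0) + c1 * c2
    return out

def poly(t):
    """p of a tree given as ("leaf", x), ("elem", lam, child)
    or ("tree", left, right)."""
    if t[0] == "leaf":
        return var(t[1])
    if t[0] == "elem":
        return mul(var(t[1]), poly(t[2]))
    return add(var("y"), mul(poly(t[1]), poly(t[2])))

w1, w2 = ("leaf", "x1"), ("leaf", "x2")
# Three assumed placements of the elementary nodes lam1, lam2 below u.
both_left = ("tree", ("elem", "lam1", ("elem", "lam2", w1)), w2)
one_each = ("tree", ("elem", "lam1", w1), ("elem", "lam2", w2))
both_right = ("tree", w1, ("elem", "lam1", ("elem", "lam2", w2)))
polys = [poly(t) for t in (both_left, one_each, both_right)]
```

All three trees yield y + λ_1 λ_2 x_1 x_2, although they are pairwise non-isomorphic as labelled trees; the polynomials of the elementary nodes themselves (for example λ_1 λ_2 x_1 versus λ_1 x_1) do differ.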
Strong paths. We shall now present a series of definitions. Let N be an IMLN, and u, v ∈ V(N). If there exists a path from u to v consisting only of elementary or reticulation nodes, we say that u is a strong ancestor of v, and that v is a strong descendant of u. Such a path is called a strong path. For example, by considering the situation in Fig 4, we can see that in all three cases w_1, w_2 strongly descend from u.
Lemma 13. Let N be an internally labelled phylogenetic network, and v_1, v_2 two reticulation nodes. If p(v_1) = p(v_2), then v_1 = v_2.
Proof. Let w_1 be the child of v_1; by the definition of the polynomial, p(v_1)/p(w_1) = λ_i for some λ_i ∈ {λ_1, …, λ_r}. Since p(v_1) = p(v_2), it also means that p(v_2)/p(w_1) = λ_i, but since N is an internally labelled phylogenetic network this implies that v_2 is a parent of w_1 and that ℓ(v_2) = λ_i. Thus, they are the same node.
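The strong-descendant relation introduced above is easy to trace on a concrete network. The sketch below assumes that tree nodes are the nodes with out-degree 0 or 2, that elementary and reticulation nodes are those with out-degree 1, and that the intermediate nodes of a strong path are the ones required to be elementary or reticulation nodes. The dictionary representation, the function name `strong_tree_descendants` and the toy network are illustrative and not part of the paper.

```python
def strong_tree_descendants(children, u):
    """For each child branch of u, follow out-degree-1 nodes until a tree
    node is reached; return (tree_node, nodes_on_the_strong_path) pairs."""
    result = []
    for child in children.get(u, []):
        path = []
        node = child
        while len(children.get(node, [])) == 1:   # elementary or reticulation
            path.append(node)
            node = children[node][0]
        result.append((node, path))
    return result

# Toy network (assumed): h1 has parents a and b, h2 has parents d and c.
net = {
    "r": ["a", "b"],
    "a": ["x1", "h1"],
    "b": ["h1", "c"],
    "h1": ["d"],
    "d": ["h2", "x2"],
    "c": ["h2", "x3"],
    "h2": ["x4"],
}
from_a = strong_tree_descendants(net, "a")
from_b = strong_tree_descendants(net, "b")
```

Here the tree node d strongly descends from both a and b through the reticulation h1, so two different tree nodes can share a strong descendant.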

Lemma 14. Let N be an internally labelled phylogenetic network, and v a reticulation node in it. A node u is a strong ancestor of v if, and only if, one of the two following conditions happens:
• p(v) | p(u), that is, p(v) divides p(u), and then u is a reticulation node, or
• p(v) | (p(u) − y), and then u is a tree node.
Proof. By the definition of the polynomial and Lemma 13.
Now, if we want to compare two IMLN's on the same sets of labels {x_1, …, x_n} and {λ_1, …, λ_r}, we should take into account the possibility that two of them are isomorphic up to a permutation of the labels. In order to express this possibility, let σ: {x_1, …, x_n, λ_1, …, λ_r} → {x_1, …, x_n, λ_1, …, λ_r} be a permutation such that σ(X) = X (i.e., one that fixes the sets of labels of the leaves and of the elementary or reticulation nodes). Given an IMLN N, we denote by ^σN the network isomorphic to N that has all its labels permuted according to σ, and by ^σp(N) we mean p(^σN) or, equivalently, the polynomial that has all its variables changed according to σ.
Definition 7. Let N_1, N_2 be two IMLN's, and σ a permutation of their labels such that σ(X) = X. We say that N_1 and N_2 are equivalent modulo strong paths if the following three conditions are satisfied: […] 3. for any tree node u in N_1, p(u) = ^σp(f(u)).
Being equivalent modulo strong paths is an equivalence relation. Remark 4. The above definition can also be easily stated exclusively in terms of strong paths, which are intrinsic to the IMLN. However, the definition in terms of the polynomial is more tractable and concise.
Notice that all the IMLT's in Fig 4 are equivalent modulo strong paths. Indeed, we present the following theorem:
Theorem 15. Let N_1, N_2 be two IMLN's, and σ a permutation of their labels such that σ(X) = X. Then, p(N_1) = ^σp(N_2) if, and only if, N_1 and N_2 are equivalent modulo strong paths.
Proof. The "if" part of the implication is direct by the first condition of the definition of equivalence modulo strong paths.
Suppose now that p(N 1 ) = σ p(N 2 ), and let us show that N 1 and N 2 must be equivalent. We first see that there exists a bijection f between the sets of tree nodes of N 1 and N 2 such that for any tree node u in N 1 , p(u) = σ p(f(u)). We will use the following inductive schema: we shall prove that, if u is a tree node in N 1 and f(u 1 ) is a tree node in N 2 such that p(u) = σ p(f(u)), then if w 1 , w 2 in N 1 are the two tree nodes that strongly descend from u 1 , then the two tree nodes w 0 1 ; w 0 2 that strongly descend from f(u) in N 2 are such that pðw 1 Þ ¼ s pðw 0 1 Þ and pðw 2 Þ ¼ s pðw 0 2 Þ. Then, we will provide tree nodes u 1 , u 2 in N 1 and N 2 , respectively, from which all other tree nodes will descend and such that p(u 1 ) = σ p(u 2 ).
Let $u$ be a tree node in $N_1$, and $w_1, w_2$ be the two tree nodes that strongly descend from it. Let $w'_1, w'_2$ be the tree nodes that strongly descend from $f(u)$ in $N_2$. Since $p(u) = \sigma p(f(u))$, and since $p(w_1), p(w_2)$ are both irreducible and different from any $\lambda_i$, it must happen that (without loss of generality) $p(w_1) = \sigma p(w'_1)$ and $p(w_2) = \sigma p(w'_2)$. Thus, set $f(w_1) = w'_1$ and $f(w_2) = w'_2$.
We will now show that there is a tree node in both $N_1$ and $N_2$ such that any other tree node descends from it. Suppose that the root of $N_1$, say $\rho_1$, is a tree node; if so, since $p(N_1) = \sigma p(N_2)$ and by Proposition 11, the root of $N_2$, say $\rho_2$, must also be a tree node. Therefore, any other tree node in their respective IMLN's must descend from them, and furthermore $p(\rho_1) = \sigma p(\rho_2)$. Set $f(\rho_1) = \rho_2$.
Finally, suppose that $\rho_1$ is not a tree node; then $p(\rho_1)$ is not an irreducible polynomial, and therefore neither is $\sigma p(\rho_2)$. Let $w_1$ be the only tree node strongly descending from $\rho_1$ in $N_1$. It is straightforward to see that, if $w'_1$ is the only tree node strongly descending from $\rho_2$ in $N_2$, then $p(w_1) = \sigma p(w'_1)$. In both cases, any other tree node in the network descends from the chosen nodes. Therefore, set $f(w_1) = w'_1$.
Now, the question arises: under which conditions can we say that two internally labelled phylogenetic networks that are equivalent modulo strong paths are actually isomorphic?
Separability: A sufficient condition. In this part we shall give a sufficient condition for two internally labelled phylogenetic networks to be completely characterized by the polynomial. In order to do so, we will work with the immediate neighbourhood of any tree node.
Let $N$ be a phylogenetic network, and let $u$ be a tree node in $N$. Let $w_1, w_2$ be the two (possibly equal) tree nodes that strongly descend from it. Let $v_1, \ldots, v_{r_1}, \ldots, v_{r_1+r_2}$ be the reticulation nodes in the strong paths from $u$ to $w_1$ and $w_2$, and suppose that there are $r_1$ such nodes in the path from $u$ to $w_1$ and $r_2$ in the other. See Fig 5. Let $U(u) = \{u_1, \ldots, u_k\}$ be the set of all the tree nodes, different from $u$, that are strong ancestors of $w_1$ or $w_2$. Note that the node $u_i$ in Fig 5 (left) is a node in $U(u)$. In what follows, we will allow ourselves to write $U$ if the context is sufficiently clear. We now present the following lemma.
Lemma 16. Consider the situation above. Let $v$ be a reticulation node from the collection $v_1, \ldots, v_{r_1+r_2}$. Then, there are two possibilities:
• both its parents are nodes from $v_1, \ldots, v_{r_1+r_2}$, or
• there exists at least one tree node $u_i \in U$ such that there is a strong path from $u_i$ to $v$ not containing any other reticulation node from $v_1, \ldots, v_{r_1+r_2}$.
Furthermore, the first possibility can only happen for one reticulation node in $v_1, \ldots, v_{r_1+r_2}$, and it will hold if, and only if, $w_1 = w_2$.
Proof. Suppose that v is the first reticulation node (counting by proximity to u) that satisfies the first condition (this makes sense, since our networks are binary). In this situation, from it emerges only one path up to the next tree node. But since N is binary, the two paths that emerged from u are now confounded in the only path from v to the next tree node, w 1 = w 2 . See Fig 5, right. Therefore, since there is now only one path of reticulation nodes, no other node in it can satisfy the first condition.
If $v$ does not satisfy the first condition, one of its parents must not be from $v_1, \ldots, v_{r_1+r_2}$. Let $u_i$ be a tree node that is a strong ancestor of such a parent of $v$. The pair $v, u_i$ satisfies the second condition. See Fig 5, left.
We say that a tree node $u_i \in U(u)$ enters the neighbourhood of $u$ at $v$ if the pair $v, u_i$ satisfies the second condition of Lemma 16. If the context is sufficiently clear, we shall only say that it enters at $v$. Likewise, we say that $v$ is the entry of $u_i$ to the neighbourhood of $u$ (or that it is just its entry).
We can then divide the set $U$ into five sets. Let $v^{(x)}$, $x \in \{1, 2\}$, be the two children of $u$; then we define
\[ U^{(x)}_1 = \{u_i \in U : u_i \text{ enters the neighbourhood of } u \text{ at only one reticulation node } v \text{ that is a strong descendant of } v^{(x)}\}; \]
\[ U^{(x)}_2 = \{u_i \in U : u_i \text{ enters the neighbourhood of } u \text{ at two (possibly equal) reticulation nodes } v_1, v_2 \text{ that are strong descendants of } v^{(x)}\}; \]

Notice that, if $w_1 \neq w_2$, then
\[ U_3 = \{u_i \in U : u_i \text{ is a strong ancestor of both } w_1 \text{ and } w_2\}. \]
The above division $\{U^{(1)}_1, U^{(2)}_1, U^{(1)}_2, U^{(2)}_2, U_3\}$ is a partition of $U$. In Fig 6, three tree nodes $u_1$, $u_2$ and $u_3$ from the set $U = U(u)$ are represented. Note that $u_1 \in U^{(1)}_1$, $u_2 \in U^{(2)}_2$ and $u_3 \in U_3$.
In general, given all the polynomials evaluated at each tree node of $U$, we cannot deduce the exact configuration of the $v_i$'s. Recall, for instance, the three situations presented in Fig 4 for the case where $r_1 + r_2 = 2$. There, we had no a priori information on which $v_i$ were strong ancestors of $w_1$ and which of $w_2$. This fact motivates the following definition.
Definition 8. Let $N$ be a phylogenetic network and $u$ a tree node in it. Let $v^{(x)}$, $x \in \{1, 2\}$, be the two children of $u$. We say that $u$ is separable if either $v^{(1)}$ and $v^{(2)}$ are tree nodes, or if there exists a tree node $u_1$ different from $u$ that satisfies one of the following conditions:
• $u_1$ is a strong ancestor of $v^{(1)}$ (or $v^{(2)}$) but not of any other strong descendant of $u$, or
• $u_1$ is a strong ancestor of $v^{(1)}$ (or $v^{(2)}$) and of one of its strong descendants.

Remark 5.
In this case, the negative definition might be more intuitive. Let $u$ be a tree node with $w_1$ and $w_2$ the tree nodes strongly descended from $u$. Then $u$ is not separable if neither of its two children $v^{(1)}$ and $v^{(2)}$ is a tree node, and
• if $w_1 \neq w_2$, all the strong ancestors of $v^{(1)}, v^{(2)}$ that are not $u$ are in $U_3(u)$, or
• if $w_1 = w_2$ and $v$ is the first reticulation node that is a strong descendant of both $v^{(1)}$ and $v^{(2)}$, then any strong ancestor of $v^{(1)}$ that is not $u$ is a strong ancestor of a reticulation node in the strong path from $v^{(2)}$ to $v$, and vice versa.
A phylogenetic network is called separable if all its tree nodes are separable.
Remark 6. Notice that separability is a purely topological condition. Thus, we will use it interchangeably for phylogenetic networks and internally labelled phylogenetic networks.
The key point in separability is that, given a separable tree node $u$ and all the polynomials of the tree nodes that are strong ancestors of $w_1$ and $w_2$, we can actually identify the polynomial $p(u_1)$ of the tree node that satisfies the conditions of the definition, and thus we can identify which reticulation nodes descend from $v^{(1)}$ and which from $v^{(2)}$. Indeed: if $w_1 \neq w_2$, $p(u_1)$ will be such that $p(w_1)$ divides $p(u_1) - y$ but $p(w_2)$ does not, and contains the largest number of $\lambda_1, \ldots, \lambda_r$ dividing $p(u) - y$. If $w_1 = w_2$, the argument is analogous using $p(w_1)^2 \nmid p(u_1) - y$. As a result, we are able to deduce that $p(v^{(x)}) = \mu^{(x)}_1 \cdots \mu^{(x)}_{r_x} p(w_x)$, $x \in \{1, 2\}$, for $\mu^{(x)}_1, \ldots, \mu^{(x)}_{r_x}$ dividing $p(u) - y$. Thus, we are able to "separate" $p(u)$ into the contributions from $p(v^{(1)})$ and $p(v^{(2)})$.
Fig 7 depicts two sub-networks which can be part of internally labelled phylogenetic networks (and then part of the underlying phylogenetic networks) that are not separable. Notice that they are not separable at any of the nodes $u_1, u_2, u_3$. The filled triangle and non-filled triangle pendant at $w_1$ and $w_2$ represent non-isomorphic sub-networks (for example a leaf and a cherry). Note that in both cases we have the same polynomials at the $u_i$, namely $p(u_1) = y + \lambda_1 \lambda_2 \lambda_3 p(w_1) p(w_2)$, $p(u_2) = y + \lambda_1 \lambda_2 \lambda_3 \lambda_4 p(w_1) p(w_2)$ and $p(u_3) = y + \lambda_1 \lambda_2 \lambda_4 p(w_1) p(w_2)$. Thus, we cannot distinguish between the sub-networks when looking at $p(u_1)$, $p(u_2)$, $p(u_3)$.
Lemma 17. Let $N$ be an internally labelled phylogenetic network, and $u_1$ a tree node in it that is one of the deepest tree nodes (i.e., one for which there exists a path of maximal length from the root to it) satisfying the following condition: there exists another tree node $u_2$ such that $p(u_1) = p(u_2)$. Then, $u_1$ and $u_2$ must have the same set of children.
Proof. If $u_1$ is a leaf, there is nothing to prove: all leaves have different labels, so if $p(u_1) = p(u_2)$ with $p(u_1) = \varphi(u_1)$, we must have $u_2 = u_1$. Otherwise, let $v^{(1)}, v^{(2)}$ be the two children of $u_1$; since $p(v^{(1)})$ and $p(v^{(2)})$ both divide $p(u_2) - y$ and are unique (because $u_1$ is one of the deepest nodes satisfying the condition in the statement of the lemma), $u_2$ is a strong ancestor of both of them. Therefore, $v^{(1)}, v^{(2)}$ must be reticulation nodes.
We write $p(v^{(x)}) = \mu^{(x)}_1 \cdots \mu^{(x)}_{r_x} p(w_x)$, $x \in \{1, 2\}$, where $w_1, w_2$ are the tree nodes that strongly descend from $u_1$. From $v^{(x)}$ to $w_x$ there is only one strong path of length $r_x$, and since $u_2$ is a strong ancestor of both $v^{(1)}$ and $v^{(2)}$ there are $r_1 + r_2$ polynomials among $\lambda_1, \ldots, \lambda_r$ that divide $p(u_2) - y$. But this is exactly the number of polynomials among $\lambda_1, \ldots, \lambda_r$ that must divide $p(u_2) - y$, since $p(u_1) = p(u_2)$.
Lemma 18. Let $N$ be an internally labelled separable phylogenetic network, and $u_1, u_2$ two internal nodes in it. Then, $p(u_1) = p(u_2)$ if, and only if, $u_1 = u_2$.
Proof. The "if" part is trivial by the definition of the polynomial. By Lemma 13, if either $u_1$ or $u_2$ is a reticulation node, the result is proven. Therefore, assume that $u_1, u_2$ are both tree nodes, and suppose, for the sake of contradiction, that $u_1 \neq u_2$. Furthermore, assume that $u_1$ is one of the deepest nodes satisfying $p(u_1) = p(u_2)$.
By Lemma 17, their sets of children are the same. Let v 1 , v 2 be the two children of u 1 and u 2 . Then u 1 and u 2 are the only strong ancestors of both v 1 and v 2 . Moreover, u 2 is in U 3 (u 1 ). This means that u 1 is not separable and, therefore, neither is N.
Corollary 19. If N is a separable phylogenetic network, then there is no pair of tree vertices with the same set of children.
Note that the converse of the above Corollary is false. See for instance the (internally labelled) phylogenetic subnetworks depicted in Fig 7. These are non-separable, and yet every pair of tree nodes has different sets of children.
Isomorphism of internally labelled phylogenetic networks. In this part we prove the main theorem of this paper. It roughly says that the polynomial is a complete invariant for the class of internally labelled separable phylogenetic networks up to equivalence modulo strong paths.
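Lemma 20. Let $N_1, N_2$ be two internally labelled phylogenetic networks such that, for any two nodes $u_1, u_2$ of the same network, $p(u_1) = p(u_2)$ implies $u_1 = u_2$, and let $f: V(N_1) \to V(N_2)$ be a bijection. If there exists a permutation $\sigma$ of their labels with $\sigma(X) = X$ such that $p(u) = \sigma p(f(u))$ for any $u \in V(N_1)$, then $f$ is an isomorphism of internally labelled phylogenetic networks.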
Proof. In order to ease the notation, and without loss of generality, let us assume that $\sigma$ is the identity. The fact that $f$ is a bijection is already required in the statement of the Lemma. Then, we must prove that if $(u, v) \in E(N_1)$, then $(f(u), f(v)) \in E(N_2)$, and that $f$ preserves the labels.
Suppose that $u$ is a reticulation node; if $(u, v) \in E(N_1)$, then $p(u) = \lambda_i p(v)$ for some $\lambda_i \in \{\lambda_1, \ldots, \lambda_r\}$. Therefore, $p(f(u)) = \lambda_i p(f(v))$ which, since $p(f(v))$ is unique for $f(v)$, implies that $f(v)$ is the only child of $f(u)$ (which is a reticulation node since $p(f(u))$ is not irreducible).
Suppose now that $u$ is a tree node, and let $v_1, v_2$ be its two children. Then, we know that $p(v_x) = p(f(v_x))$ for $x \in \{1, 2\}$, and that $p(f(u)) = y + p(f(v_1)) p(f(v_2))$. Since each node is uniquely characterized by its polynomial, it means that both $f(v_1)$ and $f(v_2)$ are strong descendants of $f(u)$. By an argument analogous to that in the proof of Lemma 17, we can deduce that $f(v_1)$ and $f(v_2)$ are actually the children of $f(u)$.
Now, we prove that $f$ preserves the labels on the leaves and on the reticulations. If $u \in L(N_1)$, then $f(u) \in L(N_2)$. Since $u \in L(N_1)$, by definition, $p(u) = \varphi_1(u)$. Moreover, $p(u) = p(f(u))$ because leaves are tree nodes. Since $f(u) \in L(N_2)$, $p(f(u)) = \varphi_2(f(u))$. Then, $\varphi_1(u) = \varphi_2(f(u))$. Now, let $u \in R(N_1)$ (a reticulation in $N_1$). By definition, $p(u) = \ell_1(u) p(v)$, where $v$ is the single child of $u$. We have seen above that $p(f(u)) = \ell_1(u) p(f(v))$; but, since $f(u)$ is a reticulation in $N_2$ and $f(v)$ is its single child, by definition, $p(f(u)) = \ell_2(f(u)) p(f(v))$. Then, $\ell_1(u) = \ell_2(f(u))$.
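The three local rules used in this proof ($p(u) = \varphi(u)$ at a leaf, $p(u) = \ell(u) p(v)$ at a reticulation with single child $v$, and $p(u) = y + p(v_1) p(v_2)$ at a tree node) can be turned into a bottom-up evaluation. The following is only a minimal sketch under stated assumptions: the network is given as a dictionary of child lists, polynomials are stored as dictionaries from monomials to integer coefficients, and all names (`node_polynomials`, `xa`, `l1`, the example network) are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def var(name):
    """Polynomial consisting of the single variable `name`."""
    return {((name, 1),): 1}

def add(p, q):
    """Sum of two polynomials (dict: monomial -> integer coefficient)."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    """Product of two polynomials; a monomial is a sorted tuple of (variable, exponent)."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = defaultdict(int)
            for name, e in m1 + m2:
                exps[name] += e
            out[tuple(sorted(exps.items()))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def node_polynomials(children, leaf_label, ret_label):
    """Return {node: p(node)} for a binary network given as a DAG of child lists."""
    cache = {}
    def p(u):
        if u not in cache:
            kids = children.get(u, [])
            if not kids:                # leaf: p(u) = phi(u)
                cache[u] = var(leaf_label[u])
            elif len(kids) == 1:        # reticulation: p(u) = l(u) * p(child)
                cache[u] = mul(var(ret_label[u]), p(kids[0]))
            else:                       # tree node: p(u) = y + p(v1) * p(v2)
                cache[u] = add(var("y"), mul(p(kids[0]), p(kids[1])))
        return cache[u]
    for u in set(children) | set(leaf_label):
        p(u)
    return cache

# Hypothetical example: root rho, tree nodes u and w, one reticulation r above leaf b.
children = {"rho": ["u", "w"], "u": ["a", "r"], "w": ["r", "c"], "r": ["b"]}
leaf_label = {"a": "xa", "b": "xb", "c": "xc"}
ret_label = {"r": "l1"}
polys = node_polynomials(children, leaf_label, ret_label)
```

In this example `polys["rho"]` encodes $y + (y + \lambda_1 x_a x_b)(y + \lambda_1 x_b x_c)$, with `l1` standing for $\lambda_1$. The memoisation in `cache` matters because a reticulation is reached from both of its parents.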
Theorem 21. Let N 1 , N 2 be two internally labelled separable phylogenetic networks. If they are equivalent modulo strong paths, then they are isomorphic.
Proof. By Lemma 18, if $N_1$ and $N_2$ are separable, then $p(u_1) = p(u_2)$ implies $u_1 = u_2$ for any internal node in either $N_1$ or $N_2$. Then, if we are able to find a bijection $f$ between the sets of nodes satisfying the premises of Lemma 20, we will be able to apply it and show the result. Now, for any reticulation node $v_*$ strongly descending from either $v^{(1)}_*$ or $v^{(2)}_*$, any of its strong ancestors that are tree nodes are such that there exists a tree node in $N_1$ with the same polynomial (and which is thus a strong ancestor of some $v$ strongly descending from $u$). Therefore, we will have that $p(v) = p(v_*)$, and we can then set $f(v) = v_*$.
Theorem 15 and Theorem 21 together imply the following main result. Theorem 22. Let N 1 , N 2 be two internally labelled separable phylogenetic networks, and σ a permutation of their labels such that σ(X) = X. If p(N 1 ) = σ p(N 2 ), then N 1 and N 2 are isomorphic.
Orchard networks. In this subsection we prove that the phylogenetic networks in the class of orchard networks [12] are separable. These (strictly) include tree-child networks.
Before we recall the definition of orchard networks, we need to introduce some definitions. Let $N$ be a phylogenetic network on $X$. Let $\{a, b\} \subseteq X$. The set $\{a, b\}$ is a cherry of $N$ if $a$ and $b$ share a parent. Let $p_a$ and $p_b$ be the parents of $a$ and $b$, respectively. If $p_b$ is a reticulation and $(p_a, p_b)$ is an arc in $N$, then $\{a, b\}$ is a reticulated cherry of $N$.
Let N be a phylogenetic network and let {a, b} be a cherry of N. Then "reduce b" is the operation of deleting b and suppressing the resulting elementary node. If p a = p b is the root of N, then delete b and the root. If {a, b} is a reticulated cherry of N in which p b is the reticulation, "cut {a, b}" is the operation of deleting (p a , p b ), and suppressing the two resulting elementary nodes. For both operations, we say that a cherry-reduction is performed on N.
Let $N$ be a phylogenetic network. The sequence $N = N_0, N_1, \ldots, N_k$ of phylogenetic networks is a cherry-reduction sequence of $N$ if, for all $i \in \{1, \ldots, k\}$, the phylogenetic network $N_i$ is obtained from $N_{i-1}$ by a (single) cherry-reduction. Then, a phylogenetic network $N$ is orchard if there exists a cherry-reduction sequence $N = N_0, N_1, \ldots, N_k$ of $N$ such that $N_k$ consists of a single vertex.
Theorem 23. Orchard networks are separable.
Proof. Let $N$ be an orchard network and let $N = N_1, \ldots, N_k$ be a cherry-reduction sequence of $N$. We prove that, for any $i \in \{1, \ldots, k - 1\}$, if $N_i$ is not separable, then $N_{i+1}$ is not separable either. This means that, if $N$ is not separable, the last network in every cherry-reduction sequence cannot be a single vertex, which contradicts $N$ being orchard.
If the reduction of a leaf in a cherry is performed, there is nothing to prove because it does not involve reticulation nodes. Then suppose that a cut of a reticulated cherry $\{a, b\}$ is performed in $N_i$. Let $p_a$ and $p_b$ be the parents of $a$ and $b$, respectively, and let $p_b$ be the reticulation node. Then $p_a$ is a tree node. Moreover, $p_a$ is a separable node in $N_i$ because the only strong descendant of $p_a$ that is a reticulation node is $p_b$. Hence, if $N_i$ is not separable, this is due to some other tree node.
Notice that the cut of the reticulated cherry $\{a, b\}$ does not change the relation of strong descendance among the remaining nodes; i.e., $v$ strongly descends from $u$ in $N_i$ if, and only if, the corresponding nodes in $N_{i+1}$ satisfy this condition too. More precisely, let $u$ be a non-separable tree node, $v^{(1)}, v^{(2)}$ its children and $w_1, w_2$ the tree nodes that strongly descend from it. By Remark 5 this means that, to begin with, neither $v^{(1)}$ nor $v^{(2)}$ is a tree node and, if $w_1 \neq w_2$, all the strong ancestors of $v^{(1)}, v^{(2)}$ that are not $u$ are in $U_3(u)$. Now, $p_a$ can never be in $U_3(u)$ because one of its children is a leaf, $a$. Therefore, the cut of the reticulated cherry $\{a, b\}$ does not affect the non-separability of $u$. Suppose now that $w_1 = w_2$. By Remark 5, if $v$ is the first reticulation node that is a strong descendant of both $v^{(1)}$ and $v^{(2)}$, the reticulation node $p_b$ cannot be in the strong paths from $v^{(1)}$ to $v$ and from $v^{(2)}$ to $v$ (note also that necessarily $p_b \neq v$). Then, both strong paths remain untouched by the cut of the reticulated cherry, and so does the set of strong ancestors of $v^{(1)}$ and $v^{(2)}$ that cause the non-separability of $u$. Therefore, any non-separable tree node in $N_i$ continues to be so in $N_{i+1}$.
Unlabelled version. Throughout this paper we have not made any use of the different labels of the leaves of an IMLN, and so the arguments could be translated, mutatis mutandis, to IMLN's whose leaves are not labelled (although internal labels would still be necessary), modelled by labelling all leaves using a single variable $x$, to give a polynomial in $\mathbb{Z}[x, \lambda_1, \ldots, \lambda_r, y]$. Again, for the case of phylogenetic networks, this would require that, given two unlabelled phylogenetic networks, we consider internally labelled phylogenetic networks with the same topology. This leads to the following proposition:
Proposition 24. Let $N_1, N_2$ be two internally labelled separable phylogenetic networks whose leaves are all labelled by $x$. Then, $p(N_1) = p(N_2)$ implies that $N_1$ and $N_2$ are isomorphic.
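As an illustration of the passage to unlabelled leaves, the following minimal sketch substitutes a single variable `x` for every leaf variable of a polynomial. The dictionary encoding (monomial to integer coefficient), the function name `forget_leaf_labels` and the example polynomial are assumptions made for this example only; the polynomial is $y + (y + \lambda_1 x_a x_b)(y + \lambda_1 x_b x_c)$ for a hypothetical one-reticulation network, written out in expanded form.

```python
from collections import defaultdict

def forget_leaf_labels(poly, leaf_vars, new_name="x"):
    """Replace every leaf variable by `new_name`, merging exponents and like terms."""
    leaf_vars = set(leaf_vars)
    out = defaultdict(int)
    for mono, coeff in poly.items():
        exps = defaultdict(int)
        for name, e in mono:
            exps[new_name if name in leaf_vars else name] += e
        out[tuple(sorted(exps.items()))] += coeff
    return {m: c for m, c in out.items() if c != 0}

# Expanded form of y + (y + l1*xa*xb) * (y + l1*xb*xc); monomials are sorted
# tuples of (variable, exponent) pairs, and l1 stands for lambda_1.
p_labelled = {
    (("y", 1),): 1,
    (("y", 2),): 1,
    (("l1", 1), ("xa", 1), ("xb", 1), ("y", 1)): 1,
    (("l1", 1), ("xb", 1), ("xc", 1), ("y", 1)): 1,
    (("l1", 2), ("xa", 1), ("xb", 2), ("xc", 1)): 1,
}
p_unlabelled = forget_leaf_labels(p_labelled, ["xa", "xb", "xc"])
```

Here two distinct labelled monomials collapse into the single term $2 \lambda_1 x^2 y$, so `p_unlabelled` encodes $y + y^2 + 2 \lambda_1 x^2 y + \lambda_1^2 x^4$, a polynomial in the three variables $x$, $\lambda_1$, $y$.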

Discussion and conclusion
In this paper a new complete polynomial invariant for a class of (binary) phylogenetic networks, that of separable networks, is introduced. It generalizes the results of [2] for phylogenetic trees and of [3] for phylogenetic networks that are characterized by their set of embedded spanning trees (such as tree-child networks). The introduced polynomial $p$ is a generalization of the Liu polynomial and it is defined on a more generic structure of networks, called IMLN's, where the reticulations are also labelled with labels other than those on the leaves. In contrast to [3], we compute the polynomial directly over the IMLN, and we avoid first computing its set of spanning trees. We prove that, for the case of separable phylogenetic networks, the internally labelled structure derived from them is completely characterized by the polynomial. This induces a complete polynomial invariant for separable phylogenetic networks. That is, given two separable phylogenetic networks $N_1$ and $N_2$ on $X$, we can fix an internally labelled phylogenetic network obtained from the first, say $N_1^*$, by bijectively labelling the reticulations. Then, if we consider all possible internally labelled phylogenetic networks obtained from $N_2$ by permuting all its variables, $X$ and the reticulations, we can compare $p(N_1^*)$ with the polynomials of all the networks obtained from $N_2$. Note that, due to Proposition 24, we could avoid the permutation of the labels on $X$, reducing the cost of this computation.
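The comparison described in this paragraph can be sketched as a brute-force search over relabellings of the reticulation variables. This is only an illustrative sketch, not the authors' implementation: it assumes polynomials stored as dictionaries from monomials to coefficients, keeps the leaf labels fixed, and tries all $r!$ permutations of the $\lambda$ labels; the names `matching_relabellings` and `build`, and the two-reticulation example, are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def var(name):
    return {((name, 1),): 1}

def add(p, q):
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = defaultdict(int)
            for name, e in m1 + m2:
                exps[name] += e
            out[tuple(sorted(exps.items()))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def relabel(poly, mapping):
    """Rename variables according to `mapping` (variables not listed are kept)."""
    out = defaultdict(int)
    for mono, c in poly.items():
        exps = defaultdict(int)
        for name, e in mono:
            exps[mapping.get(name, name)] += e
        out[tuple(sorted(exps.items()))] += c
    return {m: c for m, c in out.items() if c != 0}

def matching_relabellings(p1, p2, lambdas):
    """Return every permutation sigma of `lambdas` (as a dict) with p1 == sigma p2."""
    found = []
    for perm in permutations(lambdas):
        sigma = dict(zip(lambdas, perm))
        if relabel(p2, sigma) == p1:
            found.append(sigma)
    return found

def build(lam_top, lam_bottom):
    """Root polynomial of a hypothetical network with two stacked reticulations."""
    y, a, b, c, d = (var(n) for n in ("y", "xa", "xb", "xc", "xd"))
    top, bottom = var(lam_top), var(lam_bottom)
    p_s = add(y, mul(b, mul(bottom, d)))        # tree node above b and the lower reticulation
    p_r1 = mul(top, p_s)                        # upper reticulation
    p_t = add(y, mul(mul(bottom, d), c))        # tree node above the lower reticulation and c
    p_u = add(y, mul(a, p_r1))
    p_w = add(y, mul(p_r1, p_t))
    return add(y, mul(p_u, p_w))

p1 = build("l1", "l2")
p2 = build("l2", "l1")   # same topology, reticulation labels exchanged
matches = matching_relabellings(p1, p2, ["l1", "l2"])
```

In this example the two polynomials differ as written, and the only relabelling that identifies them is the exchange of `l1` and `l2`. The factorial cost of the loop is the reason why avoiding permutations of the leaf labels, as noted above, is worthwhile.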
Establishing a complete polynomial invariant for phylogenetic networks opens the door to several interesting opportunities for exploration, such as new ways to define metrics on networks, fast methods to distinguish networks, and possibly ways to extract important features of a network by examining this polynomial. To this end, it may be helpful to understand whether a particular polynomial is derived from a network or not (for clearly not all irreducible polynomials give networks).
Furthermore, the computation of $p(N)$ here may be performed reticulation-by-reticulation for some network classes, e.g. orchard networks [12]. That is, suppose that $N$ is an internally labelled phylogenetic network derived from an orchard network and $N = N_0, N_1, \ldots, N_k$ is a complete cherry-reduction sequence of $N$ (that is, $N_k$ is a single node). We can assign polynomials to all leaves in every intermediate IMLN $N_j$, and $p(N)$ is then the polynomial assigned to the single node in $N_k$. Start by assigning $p(u) = \varphi(u)$ for every leaf $u$ in $N_0$. Then, let $\{v_1, v_2\}$ be the two leaves involved in the cherry-reduction that moves from $N_j$ to $N_{j+1}$, and let $p(v_i)$ be the polynomial assigned to $v_i$ in $N_j$ for $i \in \{1, 2\}$. Then,
• if $\{v_1, v_2\}$ is a cherry, assign to the resulting leaf in $N_{j+1}$ the polynomial $y + p(v_1) p(v_2)$;
• if $\{v_1, v_2\}$ is a reticulated cherry ($v_2$ being the child of the reticulation labelled by $\lambda_i$), assign to the resulting leaf in $N_{j+1}$ coming from the parent of $v_1$ the polynomial $y + \lambda_i p(v_1) p(v_2)$, and to the resulting leaf in $N_{j+1}$ coming from the parent of $v_2$ the polynomial $\lambda_i p(v_2)$.
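The two assignment rules above can be sketched in a few lines. The following is only an illustration under stated assumptions: the cherry-reduction sequence is supplied by the caller as a list of steps (finding such a sequence is not addressed), polynomials are dictionaries from monomials to coefficients, and the names `reduce_sequence`, `"cherry"`, `"cut"`, `xa`, `l1` are hypothetical conventions of this example.

```python
from collections import defaultdict

def var(name):
    return {((name, 1),): 1}

def add(p, q):
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = defaultdict(int)
            for name, e in m1 + m2:
                exps[name] += e
            out[tuple(sorted(exps.items()))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def reduce_sequence(leaf_polys, steps):
    """Apply a complete cherry-reduction sequence and return p(N).

    steps: ("cherry", v1, v2)   -> v2 is deleted, v1 receives y + p(v1) p(v2)
           ("cut", v1, v2, lam) -> v2 is the child of the reticulation labelled lam;
                                   v1 receives y + lam p(v1) p(v2), v2 receives lam p(v2)
    """
    polys = dict(leaf_polys)
    y = var("y")
    for step in steps:
        if step[0] == "cherry":
            _, v1, v2 = step
            polys[v1] = add(y, mul(polys[v1], polys.pop(v2)))
        else:
            _, v1, v2, lam = step
            lam_p2 = mul(var(lam), polys[v2])
            polys[v1] = add(y, mul(polys[v1], lam_p2))
            polys[v2] = lam_p2
    (final,) = polys.values()   # exactly one leaf must remain
    return final

# Hypothetical network: root with children u and w; u -> a, r; w -> r, c; r -> b.
leaves = {"a": var("xa"), "b": var("xb"), "c": var("xc")}
p_first = reduce_sequence(leaves, [("cut", "a", "b", "l1"),
                                   ("cherry", "b", "c"),
                                   ("cherry", "a", "b")])
p_second = reduce_sequence(leaves, [("cut", "c", "b", "l1"),
                                    ("cherry", "a", "b"),
                                    ("cherry", "a", "c")])
```

Both reduction orders of this small example produce the same result, $y + (y + \lambda_1 x_a x_b)(y + \lambda_1 x_b x_c)$, which is what the node-by-node definition of the polynomial gives at the root.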
It would be interesting to investigate more optimisations for general or for specific subclasses of phylogenetic networks.
It would also be interesting to think about ways to reduce the complexity of the polynomial assigned to a network; even at the expense of a loss of the uniqueness of this assignment. One